ABSTRACT

One of the most important adaptive immune responses is triggered by
specific T-cell receptors (TCRs) binding to peptide-major
histocompatibility complexes (pMHC). Despite the availability of many
prediction servers that identify peptides binding to MHC, these servers
often do not consider peptide-TCR interactions or provide detailed atomic
interaction models. PAComplex is the first web server investigating both
pMHC and peptide-TCR interfaces to infer peptide antigens and homologous
peptide antigens of a query. The server first identifies significantly
similar TCR-pMHC templates (joint Z-value > 4.0) of the query by using
antibody-antigen and protein-protein interaction scoring matrices for the
peptide-TCR and pMHC interfaces, respectively. PAComplex then identifies
the homologous peptide antigens of these hit templates from complete
pathogen genome databases (>10^8 peptide candidates from 864 628 protein
sequences of 389 pathogens) and experimental peptide databases (80 057
peptides in 2287 species). Finally, the server outputs peptide antigens
and homologous peptide antigens of the query and displays detailed
interaction models (e.g. hydrogen bonds and steric interactions in the two
interfaces) of hit TCR-pMHC templates. Experimental results demonstrate
that the proposed server can achieve high prediction accuracy and offer
potential peptide antigens across pathogens. We believe that the server
can provide valuable insights for peptide vaccine design and MHC
restriction. The PAComplex server is available at
http://PAcomplex.life.nctu.edu.tw.

INTRODUCTION 

An immune system protects an organism from diseases by identifying and
killing pathogens (1). One of the most important adaptive immune responses
is triggered by specific T-cell receptors (TCRs) binding to peptide-major
histocompatibility complex (pMHC) molecules. The increasing number of
reliable binding peptide antigens (2-4), together with high-throughput
experiments that systematically identify pMHC interactions, explains the
growing need for fast and accurate computational methods for discovering
homologous peptide antigens of a new peptide antigen and for developing
peptide-based vaccines against pathogens.

Many methods have been proposed for predicting pMHC interactions. These
methods can be roughly divided into sequence-based methods, such as motif
matching (5,6), matrix methods [e.g. SYFPEITHI (7), MAPPP (8), IEDB (9)]
and machine learning approaches [e.g. SVMHC (10)], and structure-based
approaches [e.g. PREDEP (11) and MODPROPEP (12)]. However, these methods
often do not consider TCR binding to pMHC, which is critical for
triggering adaptive immune responses. The increasing number of TCR-pMHC
crystal structures makes it possible to investigate both pMHC and
peptide-TCR interfaces, which provides further insight into TCR-pMHC
interactions and binding mechanisms. Additionally, discovering homologous
peptide antigens (called a peptide antigen family) of a known peptide
antigen often provides a valuable reference for efforts to elucidate the
functions of a new peptide antigen.

To address these issues, we propose the PAComplex server for predicting
TCR-pMHC interactions and inferring antigen families across organisms for
a query protein or a set of peptides. To the best of our knowledge,
PAComplex is the first web server investigating both pMHC and peptide-TCR
interfaces to infer peptide antigens and homologous peptide antigens of a
query. Additionally, peptide antigen families are derived from a complete
pathogen genome database (>10^8 peptide candidates from 389 pathogens) and
experimental peptide databases to demonstrate the feasibility of the
PAComplex server and increase the number of potential antigens. Moreover,
for a peptide antigen family, the amino acid composition and conservation
are evaluated at each position. Experimental results demonstrate that the
server can improve peptide antigen prediction accuracy and is useful for
identifying peptide antigen families by using the two interfaces of
TCR-pMHC structures. Furthermore, the proposed server provides a valuable
reference for efforts to develop peptide vaccines and elucidate MHC
restriction and T-cell activation.



METHOD AND IMPLEMENTATION 

Homologous peptide antigen 

The concept of the homologous peptide antigen is the core of this server.
We define a homologous peptide antigen (p') of the peptide (p) in a
template complex as follows: (i) p and p' can be bound by the same MHC,
forming pMHC and p'MHC, respectively, with significant interface
similarity (Z_MHC > 1.645); (ii) pMHC and p'MHC can be recognized by the
same TCR with significant peptide-TCR interface similarity
(Z_TCR > 1.645); and (iii) TCR-pMHC and TCR-p'MHC share significant
complex similarity (joint Z-value > 4.0). The joint Z-value (J_z) is
defined as

    J_z = \sqrt{Z_{MHC} \times Z_{TCR}}    (1)

The Z_MHC and Z_TCR of a TCR-p'MHC candidate with interaction score E are
calculated as (E - \mu)/\sigma, where \mu is the mean and \sigma is the
standard deviation of the scores of 10 000 random interfaces
(Supplementary Figure S1). For a TCR-pMHC template collected from the
Protein Data Bank (PDB), these 10 000 random interfaces are generated by
substituting residues with other amino acids according to the amino acid
composition derived from UniProt (13). Here, J_z > 4.0 is considered a
significant similarity according to the statistical analysis of 41
TCR-pMHC structure complexes; 80 057 experimental peptide antigens; and
>10^8 peptide candidates derived from 864 628 protein sequences in 389
pathogens.
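
The Z-value and joint Z-value definitions above can be illustrated with a minimal Python sketch. It is not the server's code: the function names are hypothetical, the use of the population standard deviation is an assumption (the text does not say which estimator is used), and returning 0 for non-positive Z-values is an illustrative choice to keep the square root defined.

```python
import math
import statistics


def z_value(score, random_scores):
    """Z-value (E - mu) / sigma of a score against a random background."""
    mu = statistics.mean(random_scores)
    # Assumption: population standard deviation; the text does not specify.
    sigma = statistics.pstdev(random_scores)
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (score - mu) / sigma


def joint_z(z_mhc, z_tcr):
    """Joint Z-value J_z = sqrt(Z_MHC * Z_TCR), as in Equation (1)."""
    if z_mhc <= 0 or z_tcr <= 0:
        # Assumption: a non-positive Z on either interface is non-significant.
        return 0.0
    return math.sqrt(z_mhc * z_tcr)


def is_homologous_antigen(z_mhc, z_tcr, z_cut=1.645, jz_cut=4.0):
    """Apply criteria (i)-(iii): both interface Z-values and the joint Z-value."""
    return z_mhc > z_cut and z_tcr > z_cut and joint_z(z_mhc, z_tcr) > jz_cut
```

Because J_z is a geometric mean, a candidate that barely passes one interface threshold (Z just above 1.645) needs a very high Z-value on the other interface to reach J_z > 4.0.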

Template-based scoring function 

We have recently proposed a template-based scoring function to determine
the reliability of protein-protein interactions derived from a 3D-dimer
structure (14). For measuring the pMHC interaction score, the scoring
function is defined as

    E_{tot} = E_{VDW} + E_{SP} + E_{sim}    (2)

where E_VDW and E_SP denote the steric force and the special energy (i.e.
hydrogen bond energy and electrostatic energy), respectively, according to
four knowledge-based scoring matrices (14) that perform well for both pMHC
and protein-protein interactions. E_sim refers to the peptide similarity
score between p and p'.

To model E_VDW and E_SP of the peptide-TCR interactions, we developed a
new residue-based matrix (Supplementary Figure S2) because the peptide-TCR
interface resembles antigen-antibody interactions and differs from
protein-protein interfaces (15,16). The matrix is derived from a
non-redundant set that consists of 62 structural antigen-antibody
complexes (including 131 interfaces) constructed by Ponomarenko et al.
(17).



According to this matrix, the peptide-TCR (antigen-antibody) interface
prefers aromatic residues (i.e. Phe, Trp and Tyr), which interact with
aliphatic residues (i.e. Ala, Val, Leu, Ile and Met) or long side-chain
polar residues (i.e. Gln, His, Arg, Lys and Glu) to form strong van der
Waals (VDW) forces (yellow boxes). Additionally, the scores are high if
basic residues (i.e. Arg and Lys) interact with acidic residues (i.e. Asp
and Glu). Conversely, the scores are low (purple box) when non-polar
residues interact with polar residues.
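
As a rough illustration of how a residue-pair matrix could feed Equation (2), the sketch below sums matrix entries over contacting residue pairs. This is an assumption about the bookkeeping, not the published implementation: the function names, the symmetric lookup, the contact-list format and the idea of passing E_sim in as a precomputed number are all hypothetical, and any matrix values supplied by a caller would be placeholders rather than entries of the matrices in (14) or Supplementary Figure S2.

```python
def interface_energies(contacts, vdw_matrix, sp_matrix):
    """Sum residue-pair scores over the contacts of one interface.

    contacts: list of (peptide_residue, partner_residue, kind) tuples, where
    kind is "vdw" for steric contacts or "sp" for hydrogen-bond and
    electrostatic contacts (hypothetical input format).
    """
    e_vdw = 0.0
    e_sp = 0.0
    for pep_res, partner_res, kind in contacts:
        # Assumption: the matrices are symmetric, so the pair is unordered.
        pair = frozenset((pep_res, partner_res))
        if kind == "vdw":
            e_vdw += vdw_matrix.get(pair, 0.0)
        elif kind == "sp":
            e_sp += sp_matrix.get(pair, 0.0)
        else:
            raise ValueError(f"unknown contact kind: {kind!r}")
    return e_vdw, e_sp


def total_score(contacts, vdw_matrix, sp_matrix, e_sim):
    """E_tot = E_VDW + E_SP + E_sim, following Equation (2)."""
    e_vdw, e_sp = interface_energies(contacts, vdw_matrix, sp_matrix)
    return e_vdw + e_sp + e_sim
```

Under this reading, the pMHC and peptide-TCR interfaces would each be scored with their own matrices, and the two resulting scores would then be converted to Z_MHC and Z_TCR.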

Overview 

Figure 1 shows how the PAComplex server predicts peptide antigens and
searches for template-based homologous peptide antigens of a query protein
sequence (or a set of peptides) through the following steps (Figure 1A).
The server initially divides the query protein sequence into fixed-length
peptides (ranging from 8 to 13 residues) based on the selected MHC class I
allele and templates. Each peptide (p') is then aligned to the bound
peptide (p) of TCR-pMHC templates collected from PDB. Next, the peptide
antigen is examined by utilizing the template-based scoring function to
statistically evaluate the complex similarity (J_z > 4.0) between TCR-pMHC
and TCR-p'MHC (Figure 1B and C). For each peptide antigen, the server
presents the potential TCR-pMHC binding models and the detailed residue
interactions (e.g. hydrogen bonds and VDW forces) of the pMHC and
peptide-TCR interfaces (Figure 1D). For the hit templates, the server
identifies the homologous peptide antigens with J_z > 4.0 from an
experimental peptide database (80 057 peptides in 2287 species) and a
complete pathogen genome database (>10^8 peptide antigen candidates with
J_z > 1.645 derived from 864 628 protein sequences of 389 pathogens)
(Figure 1B and E). For a peptide antigen family, we measure the amino acid
composition and conservation at each position (Figure 1F) by using the
WebLogo program (18). Finally, the server provides peptide antigens,
visualization of the TCR-pMHC interaction models, and peptide antigen
families with conserved amino acids.
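
The first step, dividing the query into fixed-length peptides, amounts to a sliding window over the sequence. The sketch below is a minimal illustration with a hypothetical function name; 1-based start positions are assumed because the text reports peptide positions such as 497-504.

```python
def split_into_peptides(sequence, length):
    """Return (start_position, peptide) pairs for every window of `length`.

    Start positions are 1-based. Lengths outside 8-13 are rejected because
    the text states that peptide lengths range from 8 to 13.
    """
    if not 8 <= length <= 13:
        raise ValueError("peptide length must be between 8 and 13")
    return [
        (start + 1, sequence[start:start + length])
        for start in range(len(sequence) - length + 1)
    ]
```

A sequence of N residues yields N - k + 1 peptides of length k, which agrees with the counts quoted for the two example queries (750 residues give 743 8-mers; 180 residues give 172 9-mers).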



INPUT, OUTPUT AND OPTIONS 

The PAComplex server is easy to use (Figure 2). Users input a protein
sequence in FASTA format (or a set of fixed-length peptides) and select
the parameters (e.g. MHC class I allele and templates) (Figure 2A). The
PAComplex server typically infers peptide antigens and homologous peptide
antigens of the query within 4 s if the sequence length is <300. For a
query, PAComplex shows the detailed atomic interactions and binding models
using Jmol and amino acid profiles (Figure 2C) of homologous peptide
antigens from experimental peptide (Figure 2D) and complete pathogen
genome databases (Figure 2E). For each peptide antigen, PAComplex also
presents the source proteins, organisms and experimental data. In
addition, users can download summarized results of query peptides or
protein sequences, the modeled TCR-pMHC complex, the template structure of
TCR-pMHC and the peptide family of the template.

[Figure 1 image not reproduced; recoverable panel text follows.]

(A) Main procedure:
Step 1: Query a protein sequence or a set of peptides.
Step 2: Identify peptide antigen candidates of the query based on the
significant complex similarity (joint Z-value > 4) using TCR-pMHC
structural templates.
Step 3: For each peptide antigen candidate, the server provides the
binding models of peptide-MHC and peptide-TCR interfaces including the
contacting residues (e.g. hydrogen bond and van der Waals force).
Step 4: For the peptide of a hit template, we identify homologous peptide
antigens, using peptide-MHC and peptide-TCR interface similarities
(Z-value > 1.645) and the TCR-pMHC complex similarity (joint Z-value
J_z > 4), from experimental peptide and complete pathogen genome
databases.
Step 5: Measure the amino acid compositions and present conserved
interacting residues of the peptide family.
Step 6: Output binding models, homologous peptide antigens, conserved
amino acids and multiple peptide alignment across species for the query.

(B) Query: protein P (P03155, 750 residues). Databases: 41 TCR-pMHC
structural templates; experimental peptide database (80,057 peptides);
complete pathogen genome database (864,628 sequences in 389 species).
(C) Peptide antigen candidates of the query P with joint Z-values:
SYQRFRRL 4.99, IILGFRKI 4.6, KPPSFPNI 4.12, MPLSYQRF 3.74, LVVDFSQF 3.7,
FYPNFTKY 3.42.
(E) Peptide family (J_z > 4), shown for both the experimental peptides and
the complete pathogen genome: SIINFEKL, SSIEFARL, STLNFNNL, RSIDFERV,
SVIKFENL.
(F) Amino acid composition.

Figure 1. Overview of the PAComplex server for peptide antigens and homologous peptide antigens search using protein P of HBV as the query. (A) 
Main procedure. (B) Template-based scoring function to infer the peptide antigens and homologous peptide antigens through structural templates, 
experimental peptides and complete pathogen genome databases. (C) Peptide antigen candidates of the query using hit TCR-pMHC complex 
templates. (D) Atomic binding models with hydrogen bonds (green dash lines) of both pMHC and peptide-TCR interfaces. (E) Peptide antigen 
families of the query from the experimental peptide and complete pathogen genome databases. (F) Amino acid compositions (profiles) of the 
homologous peptide antigens. 



Example analysis 

Protein P of hepatitis B virus. Affecting over 350 million people
worldwide, hepatitis B virus (HBV) infection is a leading cause of liver
diseases and hepatocellular carcinoma (19,20). Figure 1 shows the
PAComplex results using protein P [UniProt (13) accession number: P03155,
750 residues divided into 743 8-mer peptides] of HBV genotype D as the
query. Protein P, a multifunctional enzyme, converts the viral RNA genome
into dsDNA in viral cytoplasmic capsids. This enzyme displays a DNA
polymerase activity that can replicate either DNA or RNA templates, and a
ribonuclease H (RNase H) activity that cleaves the RNA strand of RNA-DNA
heteroduplexes in a partially processive 3'- to 5'-endonucleasic mode
(21,22). For this query, the PAComplex server found three hit peptide
antigen candidates (J_z > 4.0; Figure 1C) and 73 homologous peptide
antigens in 21 organisms (Supplementary Figure S3A) by using the
H-2Kb-peptide-TCR template [PDB entry 3CVH (23)] and the experimental
peptide database. Among these three hit peptides, the peptide 497-504
(IILGFRKI), recorded in IEDB (4), is an epitope of protein P, and
PAComplex presents its binding models and detailed residue interactions of
the peptide-TCR and pMHC interfaces (Figure 1D). Position 1 of the
homologous peptide antigens prefers polar residues (e.g. Ser, Thr, Arg and
Lys; Figure 1F), and the first position of this peptide (pink) is the
polar residue Ser, forming five hydrogen bonds with residues Tyr7, Glu63
and Tyr171 on the MHC molecule (green) (Figure 1D). Additionally, position
7 (Lys) of the homologous peptide antigens prefers positive residues (Arg
and Lys; Figure 1F), and Lys7 of this hit peptide forms electrostatic
interactions with Asp49 in the TCR.

[Figure 2 image not reproduced; recoverable panel text follows.]

(A) Input: FASTA sequence of sp|Q50306|RL5_MYCPN, 50S ribosomal protein L5,
Mycoplasma pneumoniae (GN=rplE).
(B) Summary of query protein: 172 query peptides; 1 hit peptide, RMWAFLEKL
(rank 1, position 95); best template 2j8u.
(C) Selected template complex: query RMWAFLEKL; template ALWGFFPVL (PDB:
2j8u); MHC allele HLA-A0201; 65 contact pairs, 8 H-bonds, 22 strong VDW
pairs; joint Z-value 5.26. Display options: interacting type (strong VDW,
H-bond or both) and interface chain color (MHC: A; TCR: E, F).
(D) Peptide family from the experimental peptide database: 66 peptides in
32 organisms.
(E) Peptide family from the complete pathogen genome database: 162 122
peptides in 356 organisms.

Figure 2. PAComplex server search results using 50S ribosomal protein L5 (rplE) of M. pneumoniae as the query. (A) User interface for inputting the
query protein sequence, MHC class I allele, and templates. (B) The peptide antigen candidates (J_z > 4.0) of the query. (C) Detailed atomic
interactions and binding models with hydrogen bond residues and strong VDW forces. Peptide antigen families from (D) the experimental peptide
database and (E) the complete pathogen genome database, respectively.

Two other hit peptide antigens, 4-11 (SYQRFRRL) and 75-82 (KPPSFPNI),
correlate well with two homologous peptide antigens (SYQHFRKL and
KTPSFPNI, respectively), which are epitopes of HBV alpha 1 recorded in
IEDB (orange box in Supplementary Figure S3A). According to the amino acid
composition (profile) of this peptide antigen family (Figure 1F), position
7 prefers positively charged residues and positions 5 and 8 prefer
non-polar residues. Conversely, the compositions of positions 2 and 4 are
diverse. These two hit antigens match the profile of the antigen family at
positions 1 (polar residues), 5 (conserved residue Phe), 7 (positive or
polar residues) and 8 (non-polar residues). For instance, the two hit
antigens are conserved at position 5, with residue Phe forming strong VDW
interactions with MHC (Phe74, Val97, Tyr22, Val9 and Tyr116) and TCR
(Phe104) molecules (Supplementary Figure S3B). These residue-residue
interactions (i.e. Phe-Phe, Phe-Val and Phe-Tyr) have high scores
according to the pMHC (14) and peptide-TCR (Supplementary Figure S2)
scoring matrices. Peptides SYQRFRRL and SYQHFRKL both have a positively
charged residue (Arg or Lys) at position 7 and differ in residue type at
position 4. Peptides KPPSFPNI and KTPSFPNI differ only at position 2.
Therefore, these two hit peptides are potential peptide antigens. These
results suggest that investigating multiple TCR-pMHC interfaces and the
peptide antigen family is useful for predicting peptide antigens and
provides valuable insight into MHC restriction and T-cell activation.
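
The position-wise comparisons in this example can be reproduced with simple column counting. The sketch below is only an illustration: the server itself uses WebLogo for profiles, the function names are hypothetical, and the five family members are the ones legible in Figure 1, not the full 73-member family.

```python
from collections import Counter


def position_profile(peptides):
    """Count amino acids at each position of equal-length peptides."""
    if len({len(p) for p in peptides}) != 1:
        raise ValueError("all peptides must have the same length")
    # zip(*peptides) yields one column of residues per peptide position.
    return [Counter(column) for column in zip(*peptides)]


def differing_positions(peptide_a, peptide_b):
    """Return the 1-based positions at which two peptides differ."""
    if len(peptide_a) != len(peptide_b):
        raise ValueError("peptides must have the same length")
    return [
        index + 1
        for index, (a, b) in enumerate(zip(peptide_a, peptide_b))
        if a != b
    ]


# Family members legible in Figure 1 (a small subset of the family).
FAMILY = ["SIINFEKL", "SSIEFARL", "STLNFNNL", "RSIDFERV", "SVIKFENL"]
profile = position_profile(FAMILY)
```

For these five peptides, position 5 is Phe in every member, and the two hit peptides differ from their IEDB counterparts at the positions named in the text (4 and 7 for SYQRFRRL versus SYQHFRKL; 2 for KPPSFPNI versus KTPSFPNI).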

50S ribosomal protein L5 (rplE) of Mycoplasma pneumoniae. 50S ribosomal
protein L5 (rplE), which interacts with 5S rRNA and tRNA, is an essential
protein of M. pneumoniae, the cause of human walking pneumonia (24). Using
the M. pneumoniae rplE (Q50306; 180 residues divided into 172 9-mer
peptides) as the query (Figure 2A), the PAComplex server infers one hit
candidate (J_z > 4.0; Figure 2B), 95-103 (RMWAFLEKL), and its 66
homologous peptide antigens in 32 organisms, based on the
HLA-A0201-peptide-TCR template [PDB entry 2J8U (25); Figure 2D]. The
server provides the binding model (Figure 2C) and homologous peptide
antigens via the experimental peptide database and complete pathogen
genome database (Figure 2D and E).

The hit candidate is similar to the Rank 4 peptide (RMWEFLDRL, red box) in
the peptide family (Figure 2D), but the two peptides have different amino
acid types at positions 4, 7 and 8, where the amino acid compositions of
this family are diverse (Figure 2C). Based on the binding models and
interactions, position 4 lacks any hydrogen bonds and strong VDW contacts.
On the other hand, the hit peptide correlates well with the amino acid
profile at the conserved positions (i.e. 1, 2, 3, 5, 6 and 9), which form
strong VDW forces (the right-side table of Figure 2C) based on the
interactions of both peptide-TCR and pMHC interfaces in the
HLA-A0201-peptide-TCR template (2J8U). These results imply that the hit
peptide of rplE is a potential antigen that activates an immune response.
Furthermore, PAComplex provides the potential peptide antigens derived
from all proteins of the query organism M. pneumoniae (Figure 2E) and the
other 388 pathogens using the complete pathogen genome database. These
potential peptide antigens across pathogens can be useful in identifying
peptides specific to the target pathogen for vaccine design.



RESULTS 

To evaluate the performance of PAComplex in identifying peptide antigens
and peptide antigen families, we selected two peptide sets, termed BothMT
(Figure 3A) and CPD (Figure 3B). BothMT consists of 86 positive and 67
negative octamers with experimental data on both the H-2Kb and TCR sides
collected from IEDB (4). PAComplex aligned these 153 peptides to six
H-2Kb-peptide-TCR complex templates extracted from PDB (released on 25
December 2010) to evaluate the accuracies of the scoring functions under
various conditions (e.g. single template, multiple templates, single side
and both sides). The CPD set, which comprises >10^8 peptide candidates
(J_z > 1.645) derived from 864 628 protein sequences of 389 pathogens, was
used to evaluate the reliability of homologous peptide antigens. It was
collected by the following steps: (i) extract 389 pathogens (e.g.
bacteria, archaea and viruses) recorded in both the IEDB and UniProt (13)
databases and their respective complete genomes collected from the UniProt
database (13); (ii) derive the positive and negative data sets from IEDB
for these pathogens; and (iii) extract 41 TCR-pMHC complexes from PDB.

Figure 3A illustrates the receiver operating characteristic (ROC) curves
(i.e. true positive versus false positive rates) of our scoring functions
on single and multiple templates using one interface (i.e. the pMHC or
peptide-TCR interface) and two interfaces (i.e. the TCR-pMHC complex). We
observed several interesting results: (i) the scoring function using the
pMHC interface (blue lines) yields a higher accuracy than that using the
peptide-TCR interface (green lines); (ii) using multiple templates (solid
lines) is better than using a single template (dotted lines); and (iii)
using two interfaces with multiple templates (red) is the best among these
six combinations.
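
For readers unfamiliar with ROC analysis, the sketch below shows one standard way to obtain such a curve from scored peptides with positive and negative labels. It is a generic illustration with hypothetical function names, not the evaluation script used for Figure 3A; ties in score are grouped so that each distinct threshold contributes one point.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) points.

    scores: higher means more likely to be an antigen.
    labels: 1 (or True) for positives, 0 (or False) for negatives.
    """
    n_pos = sum(1 for lab in labels if lab)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    ranked = sorted(zip(scores, labels), key=lambda item: item[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every peptide tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) ROC points."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2.0
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )
```

Comparing the six settings in Figure 3A would then mean calling `roc_points` once per scoring configuration on the same 153 labeled BothMT peptides.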

[Figure 3 image not reproduced; surviving legend entries: Single Template
(MHC), Single Template (TCR), Single Template (Both).]

Figure 3. Evaluations of the PAComplex server on BothMT and CPD sets. (A) ROC curves of PAComplex using single and multiple templates with
one interface (pMHC and peptide-TCR) and two interfaces of TCR-pMHC complex using BothMT set. (B) Relationship between the distribution of
positive hits (blue line) and precision values (red line) with different joint Z-value thresholds using CPD set.

Next, the J_z threshold for reliable homologous peptide antigens is
determined by evaluating the PAComplex server on the large-scale CPD data
set (Figure 3B). The server was tested on >10^10 peptides derived from
864 628 protein sequences of 389 pathogens. Among these peptides, over
10^8 peptide candidates with J_z > 1.645 were selected for analyzing the
relationships between J_z values and both the number of positive
homologous peptide antigens (blue, recorded in IEDB) and precision (red).
When J_z is higher than 4.0, the precision is >0.8 and the number of
positive antigens exceeds 1600 according to the positive and negative data
sets. If the J_z threshold is set to 4.0, the total number of inferred
possible peptide antigens surpasses 4 000 000 (Supplementary Figure S4),
statistically derived from 41 TCR-pMHC complexes. The amino acid
compositions (profiles) of these 1600 positive peptide antigens closely
correspond to the ones obtained from peptide antigen families (4 000 000
antigens). These experimental results demonstrate that this server
achieves high accuracy and is able to provide potential peptide antigens
across pathogens.
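
The threshold analysis on the CPD set can be sketched as follows. This is an assumption about the bookkeeping rather than the authors' script: the function name is hypothetical, and precision is taken here as positives divided by all labeled (positive plus negative) candidates above the threshold, which is the usual definition but is not spelled out in the text.

```python
def precision_at_threshold(joint_z_values, labels, threshold):
    """Count positives and compute precision among candidates above a cutoff.

    joint_z_values: J_z of each labeled candidate.
    labels: 1 (or True) for experimentally positive, 0 (or False) for negative.
    Returns (number_of_positives, precision); precision is None when no
    candidate exceeds the threshold.
    """
    selected = [
        lab for jz, lab in zip(joint_z_values, labels) if jz > threshold
    ]
    positives = sum(1 for lab in selected if lab)
    precision = positives / len(selected) if selected else None
    return positives, precision
```

Sweeping `threshold` from 1.645 upward and plotting both returned values would give curves of the kind described for Figure 3B, where raising the cutoff trades the number of recovered positives against precision.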



CONCLUSIONS 

This work demonstrates the feasibility of using the PAComplex server to
identify peptide antigens and homologous peptide antigens. The proposed
server provides detailed atomic interactions, binding models, amino acid
compositions of peptide families, source proteins and organisms, and
experimental data. The PAComplex server is the first to infer peptide
antigens and homologous peptide antigens by considering the two TCR-pMHC
interfaces, using complete pathogen genome and experimental peptide
databases. Experimental results demonstrate that the server is highly
accurate and capable of providing potential peptide antigens across
pathogens. We believe that PAComplex is a fast search server for
homologous peptide antigens and is able to provide valuable insights into
peptide vaccine design, MHC restriction and T-cell activation.